Chapter 1: Introduction

The question of how to promote mathematics learning has concerned educators and policy makers for decades. The low math achievement of U.S. students is a matter of grave concern to educators, parents, and policy makers. A strong mathematical background is important for admission to many college majors, most professional occupations, and many computer-based technical occupations. More importantly, mathematics, as the most basic pillar of the STEM fields (science, technology, engineering, and math), is associated with a country’s global competitiveness and economic leadership. High school math course taking also predicts individuals’ future career outcomes: the more high-level math courses students take, the better their labor market outcomes. Rose and Betts (2004) showed that high school students’ math achievement is a strong indicator of their earnings around 10 years later, after controlling for demographic, family, and school characteristics, highest educational degree acquired, college major, and career. They also pointed out that more advanced math courses (e.g., algebra/geometry) have a stronger influence on students’ earnings than less advanced ones. Similarly, James (2013) found that among individuals with the same level of education, those who took more math courses in high school, especially advanced ones, were more likely to obtain a job and had higher salaries on average. These findings suggest that policy makers could narrow the earnings gap among racial and socioeconomic groups by enriching the math curriculum and encouraging low-income parents and students to invest in more math courses in high school.

Chapter 2: Description of Data

2.1 Source Data

For the current research, data were drawn from the National Center for Education Statistics (NCES) High School Longitudinal Study of 2009 (HSLS:09), the fifth in a series of longitudinal studies. HSLS:09 provides insights into students’ educational trajectories from the beginning of high school into postsecondary education, work choices, and early adult life. In particular, it investigates the paths into and out of STEM fields as well as the educational and social experiences that affect these changes (Ingels et al., 2011). HSLS:09 is a nationally representative study that collected data across all 50 states and the District of Columbia. The base-year data were collected in the 2009-10 school year from a random sample of more than 23,000 ninth graders in 944 public and private schools (Ingels et al., 2011). The first follow-up wave took place in the spring of 2012, when most sampled students were in their junior year (11th grade). These 2012 data were used in the current study because the 11th grade is a very important high school year for college admission, as well as for exploring the dynamics of educational and career decision-making. A 2013 postsecondary update provides the cohort’s college major choices and plans.

A two-stage sampling process was used for data collection in the HSLS:09 study. In the first stage, 1,889 schools were sampled from the 50 states and the District of Columbia using stratified sampling, of which 944 ultimately participated. In the second stage, students were randomly sampled within schools, yielding more than twenty thousand participants. On average, about 28 ninth-grade students were selected from each participating school. Students were sampled within strata defined by the race/ethnicity (Hispanic, Asian, Black, and Other) recorded by the school, and Asian students were slightly oversampled in each participating school (Ingels et al., 2011).

In 2012, student data were collected in 904 of the 939 high schools and included responses to a 35-minute questionnaire and a 40-minute online mathematics assessment. Parents, principals, mathematics and science teachers, and schools’ lead counselors also completed surveys via web or phone.

2.2 Data Cleaning and Data Structure

The HSLS:09 dataset was sourced from the National Center for Education Statistics website. The data were retrieved and aggregated using SPSS prior to the start of this project. The dataset contains over 23,000 student records with student ID, math standardized theta score, sex, race/ethnicity, GPA in subjects such as English, science, and social science, and student and parent expectations, among other variables. All rows contain missing data, as a large amount of nonresponse is often encountered in large-scale surveys (Madden et al., 2017). On the assumption that the missing values are unrelated to the variables of interest, complete case deletion (listwise deletion) was used in the current study: within each analysis, any case with one or more missing values on the variables involved was removed. Some selected items were reverse coded to align with the other scales and keep all items semantically in the positive direction.
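The two cleaning steps described above, listwise deletion and reverse coding, can be illustrated with a minimal Python sketch. The outputs in this report come from R; the records, field names, and the 1 to 4 response scale below are hypothetical and chosen only for illustration, not taken from HSLS:09.

```python
# Hypothetical records: None marks a missing value. Field names are
# illustrative and do not correspond to real HSLS:09 items.
records = [
    {"id": 1, "math": 59.4, "ses": 1.56, "item": 1},
    {"id": 2, "math": 47.7, "ses": None, "item": 4},
    {"id": 3, "math": None, "ses": 1.01, "item": 2},
    {"id": 4, "math": 62.6, "ses": 0.97, "item": 3},
]


def complete_cases(rows, variables):
    """Keep only rows with no missing value on the listed variables."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def reverse_code(value, low=1, high=4):
    """Flip a Likert response so that the lowest value becomes the highest."""
    return low + high - value


# Listwise deletion restricted to the variables used in one analysis.
kept = complete_cases(records, ["math", "ses"])
# Reverse-code the (assumed negatively worded) item on the remaining rows.
recoded = [{**r, "item": reverse_code(r["item"])} for r in kept]

print([r["id"] for r in kept])
print([r["item"] for r in recoded])
```

Restricting the deletion to the variables of a given analysis is why the number of deleted observations can differ from one test or model to the next.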

All variables, scales, and measures employed in this study were acquired from student and parent self-report survey data from the first follow-up year of 2012. The eleventh-grade mathematics composite standardized theta score on an algebra test was used as the dependent variable. The test measures students’ algebraic skills, understanding, and math problem solving (Ingels et al., 2011). The theta scores are approximately normally distributed and can be used to measure students’ achievement growth over time when longitudinal data are available (Ingels et al., 2011). X2SES is a socioeconomic status composite, which is derived from parent self-reported data and calculated from guardians’ education, occupation, and family income. This composite variable is standardized, and a higher value indicates a higher socioeconomic status. Three latent variables were created in the current study using exploratory and confirmatory factor analyses: parental involvement, students’ perception of teacher support, and peer influence.

## Classes 'tbl_df', 'tbl' and 'data.frame':    23503 obs. of  47 variables:
##  $ STU_ID         : num  10001 10002 10003 10004 10005 ...
##  $ X1TXMTSCOR     : num  59.4 47.7 64.2 49.3 62.6 ...
##  $ X2SEX          : Factor w/ 2 levels "1","2": 1 2 2 2 1 2 2 1 1 2 ...
##  $ X2RACE         : num  8 8 3 8 8 8 8 8 8 8 ...
##  $ X2DUALLANG     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 3 ...
##  $ X2TXMTSCOR     : num  68.6 54.1 55.6 NA 50.6 ...
##  $ X2MOMEDU       : Factor w/ 7 levels "1","2","3","4",..: 6 4 7 2 5 4 2 6 2 4 ...
##  $ X2DADEDU       : Factor w/ 7 levels "1","2","3","4",..: 6 2 NA 4 NA 5 2 NA 3 3 ...
##  $ X2FAMINCOME    : Factor w/ 13 levels "1","2","3","4",..: 11 3 6 3 7 5 4 7 6 5 ...
##  $ X2SES          : num  1.565 NA 1.014 NA 0.966 ...
##  $ X2SESQ5        : num  5 2 5 2 5 4 4 4 4 4 ...
##  $ X2BEHAVEIN     : num  NA 0.61 0.52 NA 1.21 0.89 0.39 0.61 0.34 0.7 ...
##  $ X2MEFFORT      : num  0.85 0.54 0.54 NA NA NA NA NA NA 0.87 ...
##  $ X2MTHID        : num  0.16 NA NA NA NA 1.82 0.12 0.7 0.7 0.16 ...
##  $ X2MTHEFF       : num  1.73 NA NA NA NA 0.32 NA 0.32 0.32 1.73 ...
##  $ X2STUEDEXPCT   : num  10 8 12 NA 9 10 8 8 6 12 ...
##  $ X2PAREDEXPCT   : num  10 10 12 9 12 10 12 8 13 13 ...
##  $ X2CONTROL      : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ X2LOCALE       : Factor w/ 4 levels "1","2","3","4": 4 4 2 2 1 3 4 3 4 2 ...
##  $ X2REGION       : Factor w/ 4 levels "1","2","3","4": 2 1 4 3 3 3 1 2 3 1 ...
##  $ X3TGPAENG      : num  2.5 4 2 4 3.5 3 2.5 3 3 3.5 ...
##  $ X3TGPAMAT      : num  3 4 2.5 3.5 2.5 3 2 3 3.5 4 ...
##  $ X3TGPASCI      : num  3 4 2.5 3.5 3 3 2 3 4 3.5 ...
##  $ X3TGPASOCST    : num  4 4 2.5 4 3.5 2.5 2.5 3.5 4 3.5 ...
##  $ X3TGPAART      : num  4 NA 4 4 3.5 4 3.5 4 NA 4 ...
##  $ X3TGPALANG     : num  3.5 4 3 4 3 3.5 NA 3.5 4 4 ...
##  $ X3TGPAHELPE    : num  3.5 4 4 4 3.5 3.5 3 4 4 4 ...
##  $ X3TGPACOMPSCI  : num  2.5 4 NA NA 3.5 4 NA 3.5 4 4 ...
##  $ X3TGPABUS      : num  NA NA NA NA NA 3.5 NA 4 NA NA ...
##  $ X3TGPAMISC     : num  3 4 4 NA 4 4 1.5 4 NA NA ...
##  $ X3TGPAACAD     : num  3.5 4 2.5 4 3.5 3 2.5 3.5 3.5 3.5 ...
##  $ X3TGPASTEM     : num  3 4 2.5 3.5 3 3 2 3 3.5 4 ...
##  $ X3TGPA11TH     : num  3.5 3.5 2.5 4 3.5 3 2 3.5 3.5 4 ...
##  $ X3TGPA9TH      : num  3.5 4 3.5 3.5 3.5 3 2.5 3.5 4 3.5 ...
##  $ X3TGPA10TH     : num  3.5 4 3 4 3.5 3.5 2.5 3.5 3.5 4 ...
##  $ X3TGPA12TH     : num  3.5 4 3 4 3.5 4 2.5 4 3.5 3.5 ...
##  $ X3TGPATOT      : num  3.5 4 3 4 3.5 3.5 2.5 3.5 3.5 4 ...
##  $ X3TGPAWGT      : num  3.5 4 3.5 4.5 3.5 3.5 2.5 3.5 4 4 ...
##  $ X3TAGPA10      : num  3 4 2.5 4 3 3.5 3 3.5 3.5 4 ...
##  $ X3TAGPA11      : num  3.5 3.5 2 4 3.5 2.5 1.5 3 3 4 ...
##  $ X3TAGPA12      : num  3.5 4 2.5 4 3 3.5 3 3 3.5 3.5 ...
##  $ X3TAGPA09      : num  3.5 4 3 3.5 3 2.5 2.5 3.5 4 3.5 ...
##  $ X3TAGPAWGT     : num  3.5 4 3 4.5 3.5 3 2.5 3.5 3.5 4 ...
##  $ Peer_influence : num  4.25 2.25 3.75 NA NA ...
##  $ Teacher_support: num  2.6 2.6 3.6 NA NA NA 1.8 NA 2.8 2.4 ...
##  $ Parents_Support: num  2.57 3.86 4 2.29 3.71 ...
##  $ X4ENTRYMAJ23   : Factor w/ 22 levels "1","2","3","4",..: NA 16 8 NA NA NA 14 15 3 NA ...

2.3 Testing the Nature of the Data

Our data set is longitudinal, as mentioned before: it holds math scores from two years, the 9th grade and the 11th grade. We first checked the distribution of the two math scores.

The histogram shows that both the 9th grade and the 11th grade math scores are approximately normally distributed.
We then generated a correlation table for the 9th grade math score, the 11th grade math score, and students’ total GPA in the final year of high school.

From the table we can see that there is a positive correlation between students’ 11th grade math score and their final-year high school GPA, so we use the 11th grade math score in the following analysis.
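For readers who want to see what a single entry of such a correlation table measures, here is a minimal Python sketch of the Pearson correlation coefficient. It uses only the standard library, and the five score and GPA pairs are hypothetical values, not HSLS:09 data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five students (not HSLS:09 values).
math_11th = [48.0, 52.5, 55.0, 61.0, 66.5]
gpa_total = [2.5, 3.0, 3.0, 3.5, 4.0]

r = pearson_r(math_11th, gpa_total)
print(round(r, 3))
```

A value near +1 means that higher math scores go along with higher GPA; a value near 0 means there is no linear relationship.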

Chapter 3: Descriptive and Statistical Analysis

3.1 SMART question: Does gender matter in high school students’ math performance?

The boxplot shows the math score distribution for high school girls and boys. The average scores of the two groups are almost the same, differing by only about 0.2 points. The standard deviation is 9.66 for the girls’ group and 10.62 for the boys’ group, which means that the spread of math scores is smaller for the girls’ group than for the boys’ group.
Below we run a two-sample t-test to check whether the mean math scores of the two groups differ.

Two-Sample T-test

H0: The mean math scores of the girls’ and boys’ groups are the same.

H1: The mean math scores of the girls’ and boys’ groups are not the same.

## 
##  Welch Two Sample t-test
## 
## data:  gender_sub$X2TXMTSCOR by gender_sub$X2SEX
## t = 1.2905, df = 20468, p-value = 0.1969
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.09468586  0.45964482
## sample estimates:
## mean in group 1 mean in group 2 
##        51.59149        51.40901

The p-value of 0.1969 is larger than 0.05, so we fail to reject the null hypothesis. We therefore find no statistically significant evidence that gender matters for students’ math performance. In other words, these data do not support the perception of math as a “male subject”.
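The Welch test reported above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the R code behind the output: the two small groups are hypothetical, and the p-value step uses a normal approximation, which is reasonable only because the reported degrees of freedom (about 20,468) are very large.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical scores for two small groups (not HSLS:09 values).
group_1 = [50.0, 52.0, 54.0, 56.0, 58.0]
group_2 = [49.0, 53.0, 51.0, 57.0, 55.0]
t, df = welch_t(group_1, group_2)
print(round(t, 3), round(df, 1))

# Two-sided p-value for the reported t = 1.2905, approximating the
# t distribution by the standard normal (assumption: df is very large).
p_approx = 2 * (1 - NormalDist().cdf(1.2905))
print(round(p_approx, 3))
```

Unlike the classical two-sample t-test, the Welch version does not assume equal variances in the two groups, which suits the different standard deviations noted for girls and boys.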

3.2 SMART question: Does the first language influence students’ math performance?

Students here are divided into three groups by mother language: English, non-English, and bilingual (English and non-English equally). The boxplots show a slight increase from left to right. Below we conduct an ANOVA test to check whether the differences are statistically significant.

ANOVA test

H0: The first language groups have the same mean math score.

H1: At least one first language group has a different mean math score.

##                  Df  Sum Sq Mean Sq F value   Pr(>F)    
## df$X2DUALLANG     2    2744    1372   13.32 1.65e-06 ***
## Residuals     20588 2120049     103                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2912 observations deleted due to missingness

Since p < 0.05, we reject the null hypothesis and conclude that the mean math score differs significantly for at least one of the first language groups. Note that the ANOVA alone does not tell us which means differ from one another. To determine that, we need multiple comparison (post hoc) tests.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = X2TXMTSCOR ~ df$X2DUALLANG, data = df)
## 
## $`df$X2DUALLANG`
##                          diff        lwr       upr     p adj
## non-English-English 0.3233422 -0.2237011 0.8703856 0.3485629
## Equally-English     1.4931409  0.8035256 2.1827562 0.0000011
## Equally-non-English 1.1697987  0.3277758 2.0118216 0.0032428

A post hoc Tukey test revealed no statistically significant difference between the English and non-English groups (adjusted p = 0.349 > 0.05).
The bilingual group, however, scored significantly higher than both the English group (by about 1.5 points) and the non-English group (by about 1.2 points). We conclude that students whose first language is English or non-English perform similarly in math, while students from the bilingual group perform better than the other two groups.
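The F value in the ANOVA table of this section can be recomputed from its sums of squares and degrees of freedom. The short Python sketch below shows the arithmetic; the function name is ours, and because the table prints rounded sums of squares, the result is expected to match the reported F only to about two decimals.

```python
def f_from_anova_table(ss_between, df_between, ss_within, df_within):
    """Compute the one-way ANOVA F statistic from sums of squares."""
    ms_between = ss_between / df_between  # mean square for the factor
    ms_within = ss_within / df_within     # mean square for the residuals
    return ms_between / ms_within


# Values copied from the first-language ANOVA table in this section.
f_language = f_from_anova_table(2744, 2, 2120049, 20588)
print(round(f_language, 2))
```

A large F means that the variation between group means is large relative to the variation within groups, which is what leads to rejecting the null hypothesis here.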

3.3 SMART question: Do students’ math scores differ by school background?

We classified school background along three aspects: school control (private or public), school locale (city, suburb, town, rural), and school geographical region (Northeast, Midwest, South, West).

The boxplot shows that the average math score of private school students is clearly higher than that of public school students. A t-test is conducted below to test this difference.

Two-Sample T-test

H0: The average math scores of private school and public school students are the same.

H1: The average math scores of private school and public school students are not the same.

## 
##  Welch Two Sample t-test
## 
## data:  school_sub$X2TXMTSCOR by school_sub$X2CONTROL
## t = -29.323, df = 5202.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.401184 -4.724240
## sample estimates:
##  mean in group Public mean in group Private 
##              50.79497              55.85769

The t-test shows that the p-value is much less than 0.05. We reject the null hypothesis and conclude that private school students perform better in math than public school students on average. The average math score of private school students is about 5 points higher than that of public school students.

For school locale, the boxplot shows that there are differences in math performance between the groups, and an ANOVA test is used to test them.

ANOVA test

H0: The school locale groups have the same mean math score.

H1: At least one school locale group has a different mean math score.

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## X2LOCALE        3   23842    7947   78.77 <2e-16 ***
## Residuals   20125 2030473     101                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3374 observations deleted due to missingness

From the ANOVA test, since the p-value < 0.05, we reject the null hypothesis and conclude that at least one school locale group has a mean math score that differs significantly from the others.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = X2TXMTSCOR ~ X2LOCALE, data = df)
## 
## $X2LOCALE
##                    diff        lwr        upr     p adj
## Suburb-City  -0.7100989 -1.1862062 -0.2339915 0.0007358
## Town-City    -2.6001544 -3.2122538 -1.9880551 0.0000000
## Rural-City   -2.4666706 -2.9504271 -1.9829140 0.0000000
## Town-Suburb  -1.8900556 -2.4939702 -1.2861409 0.0000000
## Rural-Suburb -1.7565717 -2.2299297 -1.2832138 0.0000000
## Rural-Town    0.1334838 -0.4764793  0.7434470 0.9432092

The Tukey test revealed that only the rural-town pair has a p-value larger than 0.05, so there is no statistically significant difference between the town and rural groups. All the other pairs have p-values smaller than 0.05, which indicates significant differences in math score. In conclusion, students in city schools perform best in math, students in suburban schools come second, and students in town and rural schools are tied for third place.

For geographical region, students are divided into 4 groups. From the boxplot, we can see that math performance differs across the groups, but some groups seem to have similar average math scores. An ANOVA test is conducted below to check in detail.

ANOVA test

H0: The geographical region groups have the same mean math score.

H1: At least one geographical region group has a different mean math score.

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## df$X2REGION     3   13409    4470   44.07 <2e-16 ***
## Residuals   20122 2040828     101                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3377 observations deleted due to missingness

Since p < 0.05, we reject the null hypothesis and conclude that the mean of students’ math score is significantly different for at least one of the school geographic region groups.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = X2TXMTSCOR ~ df$X2REGION, data = df)
## 
## $`df$X2REGION`
##                         diff         lwr        upr     p adj
## Midwest-Northeast -0.8825043 -1.46258720 -0.3024214 0.0005394
## South-Northeast   -2.2103005 -2.75095398 -1.6696470 0.0000000
## West-Northeast    -1.7275628 -2.36874326 -1.0863823 0.0000000
## South-Midwest     -1.3277962 -1.78197171 -0.8736207 0.0000000
## West-Midwest      -0.8450585 -1.41521777 -0.2748992 0.0008096
## West-South         0.4827377 -0.04725442  1.0127298 0.0891299

A post hoc test reveals no significant difference in average math score between the West and South regions, as the p-value (0.089) is larger than 0.05; all the other pairs differ significantly.
We can therefore conclude that students in the Northeast have the highest average math score, Midwest students have the second highest, and students in the West and South share the third rank.

The map plot below shows the math score distribution of the four geographical groups. The darker the color, the higher the math score.

Comparing the map plot with the U.S. News best college distribution map, we can see an apparent consistency between the two graphs. Better colleges tend to go along with higher education, higher income, and higher socioeconomic status, and the students in these “higher” groups perform better in math than other groups. Whether such a correlation exists is discussed below.

3.5 SMART question: Which affects students’ math scores most: peers, teachers, or parents?

Here we look into the influence of external factors on students’ math achievement. We have grouped the external factors into 3 parts: “Peer influence” (classmates and fellow students, their grades, plans for entrance tests, and admission into a 4-year college degree), “Teachers support” (teaching methodology), and “Parents support” (involvement in and discussions about college entrance exams and applications for career courses).

The above correlation plot shows that peer influence and parents’ support are about equally correlated with students’ math scores, and more strongly correlated than teacher support. However, to find out which factor has the greatest effect, a linear regression is performed for a clearer interpretation.

LINEAR REGRESSION

## 
## Call:
## lm(formula = X2TXMTSCOR ~ Peer_influence + Teacher_support + 
##     Parents_Support, data = df, x = TRUE)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.701  -6.109  -0.266   6.921  32.655 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      30.6635     0.9234  33.208   <2e-16 ***
## Peer_influence    2.7682     0.1852  14.945   <2e-16 ***
## Teacher_support   1.6573     0.1765   9.391   <2e-16 ***
## Parents_Support   2.5987     0.1805  14.397   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.646 on 6543 degrees of freedom
##   (16956 observations deleted due to missingness)
## Multiple R-squared:  0.08447,    Adjusted R-squared:  0.08405 
## F-statistic: 201.2 on 3 and 6543 DF,  p-value: < 2.2e-16

The multiple regression model with these three predictors (peer influence, teacher support, parents’ support) is significant at the 0.05 level, F(3, 6543) = 201.2, p < 0.05. All three factors significantly predict students’ math achievement; peer influence has the largest regression weight (2.77), closely followed by parents’ support (2.60), after controlling for the other variables in the model.
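The overall F statistic of this regression is tied directly to its R-squared. The Python sketch below recomputes it from the values printed in the summary; the function name is ours, and since the summary reports a rounded R-squared, agreement is expected only to about one decimal.

```python
def f_from_r_squared(r_squared, n_predictors, residual_df):
    """Overall regression F statistic implied by R-squared."""
    explained = r_squared / n_predictors          # explained variance per predictor
    unexplained = (1 - r_squared) / residual_df   # unexplained variance per residual df
    return explained / unexplained


# Values copied from the regression summary in this section.
f_stat = f_from_r_squared(0.08447, 3, 6543)
print(round(f_stat, 1))
```

This shows why a model can be highly significant while explaining only about 8% of the variance: with several thousand observations, even a small R-squared yields a large F.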

3.6 SMART question: Which factor affects students’ math scores most (SES, student math identity, or other parties’ support)?

The linear regression model below is fitted to find the most influential factor among SES (which includes parents’ education, income, and occupation), students’ math identity (motivation factors such as their performance and their self-efficacy in math), and other parties’ support (peer influence, teacher support, parents’ support).

Model 1:

## 
## Call:
## lm(formula = X2TXMTSCOR ~ X2SES + X2MTHID + X2MTHEFF + Peer_influence + 
##     Teacher_support + Parents_Support, data = df, x = TRUE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -31.0964  -5.7800   0.3296   5.9819  21.5655 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     46.89029    1.90229  24.649  < 2e-16 ***
## X2SES            5.00195    0.39709  12.596  < 2e-16 ***
## X2MTHID          4.09127    0.36999  11.058  < 2e-16 ***
## X2MTHEFF         0.01673    0.36663   0.046    0.964    
## Peer_influence   2.14273    0.37022   5.788 8.47e-09 ***
## Teacher_support -0.16555    0.33862  -0.489    0.625    
## Parents_Support -0.60055    0.36937  -1.626    0.104    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.211 on 1710 degrees of freedom
##   (21786 observations deleted due to missingness)
## Multiple R-squared:  0.1917, Adjusted R-squared:  0.1889 
## F-statistic: 67.59 on 6 and 1710 DF,  p-value: < 2.2e-16
##           X2SES         X2MTHID        X2MTHEFF  Peer_influence 
##        1.072763        1.232563        1.290443        1.058720 
## Teacher_support Parents_Support 
##        1.076142        1.046133

The above multiple regression model produced F(6, 1710) = 67.59, but three predictors (X2MTHEFF, Teacher_support, and Parents_Support) have p-values above 0.05. We therefore fit a second linear regression model that drops these nonsignificant variables.

Model 2:

## 
## Call:
## lm(formula = X2TXMTSCOR ~ X2SES + X2MTHID + Peer_influence, data = df, 
##     x = TRUE)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.085  -5.591   0.314   6.152  24.956 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     42.8758     0.6500   65.96   <2e-16 ***
## X2SES            5.5290     0.2289   24.15   <2e-16 ***
## X2MTHID          4.0002     0.1855   21.56   <2e-16 ***
## Peer_influence   2.3056     0.1899   12.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.412 on 5967 degrees of freedom
##   (17532 observations deleted due to missingness)
## Multiple R-squared:  0.1916, Adjusted R-squared:  0.1912 
## F-statistic: 471.5 on 3 and 5967 DF,  p-value: < 2.2e-16
##          X2SES        X2MTHID Peer_influence 
##       1.029698       1.006743       1.031676

R-squared is a statistical measure of how close the data are to the fitted regression line. Our data come from social science, where outcomes are noisy, so an R-squared of 0.1916, meaning the model explains 19.16% of the variance in math scores, is a good value for this kind of data. Moreover, the adjusted R-squared rose from 0.1889 to 0.1912 after dropping the less significant variables, although the two models were fitted on different numbers of complete cases (1,717 versus 5,971), so the comparison is only approximate. All Variance Inflation Factors (VIF) are close to 1, well below 10, which indicates no multicollinearity.

As can be seen, students’ SES background has the largest and most significant regression weight after controlling for the other variables in the model. In addition, students’ math identity also has a significant positive regression weight, indicating that math scores tend to be higher for students with a stronger math identity, after controlling for the other variables in the model.
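The adjusted R-squared values reported for Model 1 and Model 2 can be recomputed from R-squared, the number of observations, and the number of predictors. The Python sketch below does so with the figures from the two summaries; the function name is ours, and the number of observations is inferred as residual degrees of freedom plus predictors plus one for the intercept.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# n_obs = residual degrees of freedom + number of predictors + 1 (intercept).
model_1 = adjusted_r_squared(0.1917, 1710 + 6 + 1, 6)
model_2 = adjusted_r_squared(0.1916, 5967 + 3 + 1, 3)
print(round(model_1, 4), round(model_2, 4))
```

Because the penalty shrinks as the sample grows and the number of predictors falls, Model 2 loses less to the adjustment than Model 1 even though its raw R-squared is marginally lower.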

3.7 SMART question: Is there any correlation between final year Math GPA and other final courses GPA?

For this question we examine the relation between the final-year math GPA and the GPA in other courses through a correlation plot. We compute the correlation between the final-year math GPA and 11 other GPA variables, covering subjects such as science, business, and foreign languages.

As the correlation coefficients in the corrplot show, math GPA is highly correlated with the STEM (science, technology, engineering, and mathematics) GPA, which is among the college-deciding scores. The higher a student’s math GPA, the higher the chance that the student also has a high total GPA. The four courses most highly correlated with math are science, English, social science, and foreign language. In contrast, subjects such as computer science and business are less correlated with math GPA in the plot.

3.8 SMART question: What is the most chosen college major for students who are good at math in high school?

We use bar charts to find the most chosen college majors for students who are good at math. The variables used here are student ID, student math score, and the major the student expects to consider after completing high school. We created 2 bar charts: Graph 1 shows the most chosen majors in college after high school, and Graph 2 shows the majors chosen by students with the top 25% of math scores.

As the bar charts show, high school students overall choose majors such as business, health care, biological and physical sciences, and engineering when they pursue further education. Among students with the top 25% of math scores (Graph 2), however, biological sciences and engineering are the main majors selected.
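The counts behind two such bar charts can be sketched in Python as follows: one tally over all students and one over students at or above the third quartile of math scores. The eight students and their majors are hypothetical, and the quartile rule used is the default of the standard library, which may differ slightly from the rule used for the charts in this report.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical (math score, intended major) pairs; not HSLS:09 values.
students = [
    (45.2, "Business"), (48.9, "Health care"), (51.3, "Business"),
    (53.0, "Education"), (56.7, "Engineering"), (58.1, "Business"),
    (61.4, "Engineering"), (64.8, "Biological sciences"),
]

scores = [score for score, _ in students]
# quantiles(..., n=4) returns the three quartile cut points; the last is Q3.
q3 = quantiles(scores, n=4)[2]

# Tally majors for everyone, then only for the top 25% of math scores.
all_majors = Counter(major for _, major in students)
top_majors = Counter(major for score, major in students if score >= q3)

print(all_majors.most_common(1))
print(top_majors.most_common())
```

Comparing the two tallies side by side is what reveals a shift in preferred majors among the highest-scoring students.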

Chapter 4: Data Limitations

The study is based on follow-up questionnaires and interactions with students, parents, teachers, counselors, and school administrators. Failure to answer those questionnaires, or incorrect answers, could lead to data inaccuracy, and missing entries in the dataset could affect the overall analysis.

Chapter 5: Conclusion

In conclusion, the dataset focuses on how students plan and make decisions about postsecondary options. Several outcomes can be derived from the analysis of the effects of different factors on high school students’ math achievement and of how that achievement affects their postsecondary study. The tests show that gender does not significantly affect students’ math scores, and that a student whose father or mother has a higher education level tends to have a higher math score in high school. Father’s education level has a larger impact on students’ math scores, and first language also makes a difference. A student whose family income is higher tends to have a higher math score in high school. Besides, math scores of private school students are on average higher than those of public school students. School location correlates with students’ math performance. Comparing the different factors, we can conclude that socioeconomic status, students’ own math motivation, and peers’ support in math study affect students’ achievement most. Students’ math achievement greatly affects their future choices.

Bibliography

Abadi, M., & Kiersz, A. (2018). Exactly which states are in the Northeast, Midwest, South, and West, according to the US government. Business Insider. Accessed at https://www.businessinsider.com/united-states-regions-new-england-midwest-south-2018-4#and-west-south-central-includes-the-western-most-states-in-the-south-10

Boyington, B. (2018). Where to Find the 2019 U.S. News Best Colleges. U.S. News. Accessed at https://www.usnews.com/education/best-colleges/articles/2018-09-11/where-to-find-the-2019-us-news-best-colleges

Chaudhry, M. (2015). College Success Starts In Math Class. Forbes. Accessed at https://www.forbes.com/sites/schoolboard/2015/05/08/college-success-starts-in-8th-grade-math-class/#3d775c17248e

Cvencek, D., Meltzoff, A. N., & Greenwald, A. G. (2011). Math–gender stereotypes in elementary school children. Child Development.

Math milestones: The critical role of math achievement in student success. Renaissance. Accessed at https://www.renaissance.com/2018/03/22/blog-math-milestones-critical-role-math-achievement-student-success/